Exponential random graph models

SNA Week 10

Matt Pietryka

Problem 1: Relational (Dyadic) Nonindependence

\[ L(\mathbf{\theta} |\mathbf{y}, \mathbf{X}) = \prod_{i = 1}^n f(y_i|\mathbf{x_i}, \mathbf{\theta}) \]

  • The likelihood factorizes this way only if observations are conditionally i.i.d.
  • So two observations that match on covariates are assumed to share the same DGP
  • And, by construction, observations cannot influence one another endogenously
  • That assumption is substantively invalid in networks
  • With regression on relational data, effects of the relational structure may be falsely attributed to the covariates
  • This raises the risk of faulty inference

Problem 2: Data Multiplication

  • This is a problem at the (common) dyadic level
  • If there are \(n\) actors in the system, there are \(\binom{n}{2}\) relational dyads
  • If the dyads are directed, there are \(2\binom{n}{2}\) dyads
  • If there are 150 actors in the system, there are 11,175 dyads for that point in time
  • Example: if 150 states are observed between 1816 and 2010, there are 2,167,950 dyads
  • If those dyads are directed, there are 4,335,900 dyads
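The counts above are simple binomial-coefficient arithmetic; a quick check in R (the slide's totals imply 194 yearly panels between 1816 and 2010):

```r
choose(150, 2)              # 11175 undirected dyads among 150 actors
2 * choose(150, 2)          # 22350 directed dyads
194 * choose(150, 2)        # 2167950 undirected dyad-years, 1816-2010
194 * 2 * choose(150, 2)    # 4335900 directed dyad-years
```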

Problem 2: Data Multiplication

Data multiplication produces two consequences:

  • Problem 2A: p-values are strongly related to sample size
    • Higher N results in lower p
    • Artificial inflation of N is akin to artificial shrinkage of p
    • So we need to worry about the reliability of dyadic findings
  • Problem 2B: Nonsensical treatment of multiparty events
    • Should WWI and WWII really account for 46% of the wars since 1816?
    • 53% of the wars in the 20th century?
    • Were WWI and WWII really 567 entirely independent dyadic wars?
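The mechanics of Problem 2A can be seen with a toy calculation (the effect size here is made up for illustration): holding a small effect fixed, multiplying observations shrinks the standard error and hence the p-value.

```r
# Same effect size, different N: the z statistic grows with sqrt(N),
# so p shrinks mechanically as observations are multiplied
effect <- 0.05                # illustrative standardized effect (made up)
p_val <- function(N) 2 * pnorm(-abs(effect * sqrt(N)))
p_val(150)                    # modest N: p is large
p_val(11175)                  # dyad-scale N, same effect: p is tiny
</imports>
```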

Why use ERGMs

  1. Test hypotheses
    • Example: Does the Krackhardt network exhibit reciprocity?
  2. Simulation for theoretical exploration
    • Example: How should seats be assigned in a classroom to encourage cross-racial friendships?
  3. Tie prediction
    • Example: Will Canada attack next year?

ERGMs integrate into a single model relationships of two forms

  1. Covariate questions
    • Do legislators in the same political party collaborate more frequently than those in opposite parties?
    • Do states with democratic governments have more alliances than those with autocratic regimes?
  2. Interdependence questions
    • Are two states at war with the same third state less likely to be at war with each other?
    • Are there popularity effects in the choice of co-authors?

ERGM notation

  • Network \(N\) is composed of \(n\) nodes
  • Assume only binary edges between nodes \(i\) and \(j\) , \(N_{ij} \in \{0, 1\}\) (relaxable)
  • \(\mathcal{N}\): the set of possible networks on those \(n\) nodes
  • Modeling objective: the probability of observing the network we did observe, \(\mathcal{P}(N)\), over the networks we could have observed
  • If \(\mathcal{N}\) is the set of possible networks with the same number of nodes, then \(N \in \mathcal{N}\) and we want to know \(\mathcal{P}(N)\)

The probability (likelihood function) of observing network \(N\) is: \[ \mathcal{P}(N, \theta) = \frac{exp(\mathbf{\theta'h}(N))}{\sum_{N^*\in\mathcal{N}}exp(\mathbf{\theta'h}(N^*))} \]

  • \(\mathbf{h}(N)\) = network statistics
    • Flexible: \(\mathbf{h}\) can model virtually any form of interdependence or covariates
  • \(\theta\) = coefficients
  • \(\sum_{N^*\in\mathcal{N}}exp(\mathbf{\theta'h}(N^*))\) = Normalizing constant
    • Normalizing constant can make estimation difficult

\[ \mathcal{P}(N, \theta) = \frac{exp(\mathbf{\theta'h}(N))}{\sum_{N^*\in\mathcal{N}}exp(\mathbf{\theta'h}(N^*))} \]

  • Conceptual trick: treat a network as a single multivariate observation, rather than a large number of relational observations
  • If the whole network is only one observation, no need for i.i.d. assumptions
  • The model can now accommodate (almost) arbitrarily complex relationships
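To make the normalizing constant concrete, here is a toy enumeration (my own illustration, not from the slides): with \(n = 3\) nodes, three possible undirected edges, and a single edges statistic, we can sum over all \(2^3 = 8\) candidate networks directly.

```r
theta <- -0.5                               # made-up coefficient on edges
configs <- expand.grid(e12 = 0:1, e13 = 0:1, e23 = 0:1)  # all 8 networks
h <- rowSums(configs)                       # h(N) = edge count of each network
kappa <- sum(exp(theta * h))                # the normalizing constant
probs <- exp(theta * h) / kappa
sum(probs)                                  # 1: a proper probability distribution
```

For realistic \(n\), \(\mathcal{N}\) is astronomically large, which is exactly why the normalizing constant makes estimation difficult and MCMC methods are needed.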

ERGM makes two big assumptions

Given a set of network statistics:

  1. There is equal probability of observing any two networks with the same values of those statistics
    • Assumes the model is completely and correctly specified
  2. The observed network exhibits the average value of those statistics over the networks that could have been observed
    • No different from the assumption that the average relationships in a dataset are representative of the population (implicit in GLMs)

Limitations of (Basic) ERGM

  • Challenges in estimation
  • The ERGM cannot be used to model longitudinally observed networks
    • BUT: Read next week about the Temporal ERGM
  • The ERGM cannot accommodate non-binary networks

Estimating an ERGM

  • You must:
    1. Conceptually define dependencies that should/might exist in the network
    2. Define empirical measures of those dependencies (i.e., \(\mathbf{h}(N)\))
  • The ERGM software will then:
    1. Find most likely set of \(\theta\).
    2. Simulate networks so you can check model fit

Define \(h(N)\) for Exogenous Covariates

Generally, a statistic that captures the relationship between X and N is \[ h_D(N,X) = \sum_{ij}N_{ij}X_{ij} \]

  • Naturally accommodates dyad-level covariates
  • Node level covariates must be “scaled up”:
    • \(X_{ij} = Z_i\) for node-level covariate \(Z\)
  • Interpretation: If \(X\) has a positive relationship w/ edge formation, then \(h(N)\) will be higher than if the covariate values were randomly assigned to edges, controlling for the other \(h(N)\)s on the network configuration
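A minimal matrix version of \(h_D\) (toy adjacency and covariate values, all made up):

```r
# h_D(N, X) = sum_ij N_ij * X_ij, with a node-level covariate "scaled up"
N <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)   # toy directed adjacency
Z <- c(1, 0, 1)            # toy node-level covariate
X <- matrix(Z, 3, 3)       # scaled up: X[i, j] = Z[i], the sender's value
h_D <- sum(N * X)          # edges weighted by the sender's covariate
h_D                        # 1: only the 1 -> 2 tie has a sender with Z = 1
```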

Define \(h(N)\) for Endogenous Covariates

  • How would we measure reciprocity?
  • A statistic we would expect to be high if ties were reciprocated a lot and low if they were not reciprocated
  • \(h_R(N) = \sum_{i <j}N_{ij}N_{ji}\)
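In matrix form, \(h_R\) counts mutual dyads; a quick sketch with a toy network:

```r
# h_R(N) = sum_{i<j} N_ij * N_ji = number of reciprocated dyads
N <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)   # toy directed adjacency
h_R <- sum(N * t(N)) / 2   # each mutual dyad appears twice in N * t(N)
h_R                        # 1: the only reciprocated dyad is 1 <-> 2
```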

Define \(h(N)\) for Endogenous Covariates

  • Popularity
    • \(h_P(N) = \sum_{i,j,k}N_{ji}N_{ki} + N_{kj}N_{ij} + N_{ik}N_{jk}\)
  • Sociality
    • \(h_S(N) = \sum_{i,j,k}N_{ij}N_{ik} + N_{jk}N_{ji} + N_{ki}N_{kj}\)

Define \(h(N)\) for Endogenous Covariates

  • Transitivity
    • \(h_T(N) = \sum_i\sum_{i \neq j, k}N_{ij}N_{ik}N_{jk}\)
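The transitivity statistic also has a compact matrix form (my own sketch on a toy network): \((N N)_{ik}\) counts two-paths \(i \to j \to k\), so multiplying by \(N_{ik}\) and summing counts the closed ones.

```r
N <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)   # toy: 1->2, 1->3, 2->3
h_T <- sum(N * (N %*% N))  # (N %*% N)[i, k] counts two-paths i -> j -> k
h_T                        # 1: the single transitive triple 1->2, 2->3, 1->3
```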

Network Level ERGM Interpretation

The relative likelihood of observing \(N^{j+}\) to observing \(N^{j-}\) is \(\exp(\theta_j)\), where

  • \(\theta_j\) is the coefficient associated with statistic \(j\)
  • \(N^{j+}\) is one unit greater than \(N^{j-}\) on statistic \(j\) (e.g., \(N^{j+}\) has one more closed triangle than \(N^{j-}\)), holding all other statistics constant

Edge Level ERGM Interpretation

\[ P(N_{ij} = 1|N_{-ij}, \theta) = \text{logit}^{-1}\left(\sum_{r=1}^k \theta_r\delta_r^{ij}(N)\right) \]

  • \(N_{-ij}\) indicates the network excluding \(N_{ij}\)
  • \(\delta_r^{ij}(N)\) is equal to the change in \(h(N)\) when \(N_{ij}\) is changed from zero to one
  • \(\text{logit}^{-1}(x) = 1/(1 + \text{exp}(-x))\)
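The inverse logit above is what base R's plogis() computes; a quick sanity check:

```r
invlogit <- function(x) 1 / (1 + exp(-x))
invlogit(0)                                    # 0.5: zero log-odds = even odds
all.equal(invlogit(-0.9072), plogis(-0.9072))  # TRUE: identical to plogis
```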

Estimating the ERGM in R

Load the data

Source: http://michaellevy.name/blog/ERGM-tutorial/#ergm-output

Example: Samuel F. Sampson recorded the social interactions among a group of monks.

library(tidyverse)
library(statnet)
library(xergm)

# ALWAYS SET YOUR SEED
set.seed(1)


# LOAD Sampson's Monastery Study
data('sampson')
n <- samplike
n
##  Network attributes:
##   vertices = 18 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   total edges= 88 
##     missing edges= 0 
##     non-missing edges= 88 
## 
##  Vertex attribute names: 
##     cloisterville group vertex.names 
## 
##  Edge attribute names: 
##     nominations

Examine node attributes

  • cloisterville: Some novices had attended the minor seminary of ‘Cloisterville’ before they came to the monastery
  • group : Sampson divided novices into four groups based on their role in a political crisis within the cloister that occurred during his stay
# EXAMINE NODE ATTRIBUTES
n %v% "cloisterville"
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

n %v% "group"
##  [1] "Turks"    "Turks"    "Outcasts" "Loyal"    "Loyal"    "Loyal"   
##  [7] "Turks"    "Loyal"    "Loyal"    "Loyal"    "Loyal"    "Turks"   
## [13] "Outcasts" "Turks"    "Turks"    "Turks"    "Outcasts" "Outcasts"

PLOT THE NETWORK

par(mar = c(0, 0, 0, 0))
plot(n,
  displaylabels = TRUE,
  vertex.cex = degree(n, cmode = 'indegree') / 2,
  vertex.col = 'group',
  vertex.sides = ifelse(n %v% 'cloisterville', 4, 50),
  pad = 1
)

RUN AN ERGM

# NETWORK OBJECT ON LEFT, TERM(S) ON RIGHT
m1 <- ergm(n ~ edges)
## Evaluating log-likelihood at the estimate.
summary(m1)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   n ~ edges
## 
## Iterations:  4 out of 20 
## 
## Monte Carlo MLE Results:
##       Estimate Std. Error MCMC % p-value    
## edges  -0.9072     0.1263      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 424.2  on 306  degrees of freedom
##  Residual Deviance: 367.2  on 305  degrees of freedom
##  
## AIC: 369.2    BIC: 372.9    (Smaller is better.)

THE EDGES TERM SHOWS THE DENSITY (IN LOG-ODDS)

coef(m1)[[1]]  %>% plogis()
## [1] 0.2875817
network.density(n)
## [1] 0.2875817

nodematch(x) explores tendency toward homophily on node attribute x

m2 <- ergm(n ~ edges + nodematch("group"))
## Evaluating log-likelihood at the estimate.
summary(m2)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   n ~ edges + nodematch("group")
## 
## Iterations:  5 out of 20 
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC % p-value    
## edges            -2.0015     0.2131      0  <1e-04 ***
## nodematch.group   2.6481     0.3026      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 424.2  on 306  degrees of freedom
##  Residual Deviance: 276.9  on 304  degrees of freedom
##  
## AIC: 280.9    BIC: 288.3    (Smaller is better.)

dyads
## # A tibble: 306 × 10
##        a     b edge_pair_possible  edge     name_a cloisterville_a group_a
##    <int> <int>              <chr> <dbl>      <chr>           <lgl>   <chr>
## 1      1     2                1-2     1 John Bosco            TRUE   Turks
## 2      1     3                1-3     1 John Bosco            TRUE   Turks
## 3      1     4                1-4     0 John Bosco            TRUE   Turks
## 4      1     5                1-5     1 John Bosco            TRUE   Turks
## 5      1     6                1-6     0 John Bosco            TRUE   Turks
## 6      1     7                1-7     0 John Bosco            TRUE   Turks
## 7      1     8                1-8     1 John Bosco            TRUE   Turks
## 8      1     9                1-9     0 John Bosco            TRUE   Turks
## 9      1    10               1-10     0 John Bosco            TRUE   Turks
## 10     1    11               1-11     0 John Bosco            TRUE   Turks
## # ... with 296 more rows, and 3 more variables: name_b <chr>,
## #   cloisterville_b <lgl>, group_b <chr>
dyads <- dyads  %>%  mutate(nodematch.group = ifelse(group_a == group_b, 1, 0))

m2_logit <- glm(edge ~ nodematch.group, data = dyads, family = "binomial")
summary(m2_logit)
## 
## Call:
## glm(formula = edge ~ nodematch.group, family = "binomial", data = dyads)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4614  -0.5035  -0.5035   0.9178   2.0631  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -2.0015     0.2131  -9.393   <2e-16 ***
## nodematch.group   2.6481     0.3026   8.751   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 367.18  on 305  degrees of freedom
## Residual deviance: 276.86  on 304  degrees of freedom
## AIC: 280.86
## 
## Number of Fisher Scoring iterations: 4

Statistical models

                       ergm        logit
    edges             -2.00***
                      (0.21)
    nodematch.group    2.65***     2.65***
                      (0.30)      (0.30)
    (Intercept)                   -2.00***
                                  (0.21)
    AIC              280.86      280.86
    BIC              288.31      288.31
    Log Likelihood  -138.43     -138.43
    Deviance                     276.86
    Num. obs.                    306
    ***p < 0.001, **p < 0.01, *p < 0.05

RUN AN ERGM W/ A TERM FOR RECIPROCITY OF TIES

Given an i -> j tie, what is the change in the log-odds of a j -> i tie?

m3 <- ergm(n ~ edges + mutual)
## Starting maximum likelihood estimation via MCMLE:
## Iteration 1 of at most 20: 
## The log-likelihood improved by 0.002666 
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20: 
## The log-likelihood improved by 0.003867 
## Step length converged twice. Stopping.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## 
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(m3)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   n ~ edges + mutual
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##        Estimate Std. Error MCMC % p-value    
## edges   -1.7647     0.2051      0  <1e-04 ***
## mutual   2.3231     0.4156      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 424.2  on 306  degrees of freedom
##  Residual Deviance: 332.3  on 304  degrees of freedom
##  
## AIC: 336.3    BIC: 343.7    (Smaller is better.)

# BASELINE PROBABILITY OF A TIE
coef(m3)[['edges']]  %>% plogis()
## [1] 0.1462007
exp(coef(m3)[['edges']])/(exp(coef(m3)[['edges']]) + 1)
## [1] 0.1462007
interpret(m3, type = "tie", i = 4, j = 1) # NO TIE FROM 1 -> 4
## [1] 0.1462007
# But if the reciprocal tie is present, the log-odds of the tie is 2.32 higher:
plogis(coef(m3)[['edges']] + coef(m3)[['mutual']])
## [1] 0.6360828
interpret(m3, type = "tie", i = 2, j = 1) # NO TIE FROM 1 -> 2
## [1] 0.6360828
# Probability of both ties occurring simultaneously
interpret(m3,type = "tie",i = c(2, 4), j = 1)
## [1] 0.230354
## Probability of mutual ties.
interpret(m3, type = "dyad", i = 1, j = 11)
##           j->i = 0  j->i = 1
## i->j = 0 0.6090989 0.1042993
## i->j = 1 0.1042993 0.1823024
## Probability of sending to multiple nodes.
interpret(m3, type = "node", i = 11, j = c(1, 14))
##           probability Receiver 1 Receiver 14
## Sender 11  0.31071229          0           0
## Sender 11  0.09299574          1           1
## Sender 11  0.05320495          1           0
## Sender 11  0.54308702          0           1

CHECKING FOR MODEL DEGENERACY

mcmc.diagnostics(m3)

Example of bad chain

mbad <- ergm(n ~ edges + mutual,
            control = control.ergm(MCMC.interval = 2))
mcmc.diagnostics(mbad)

Examining model fit

Let’s use the gof function to see whether the estimates fit the data well.

  • gof simulates networks from the ERGM estimates and, for some set of network statistics, compares the distribution in the simulated networks to the observed values.

m3_gof <- ergm::gof(m3, GOF = ~ model)
m3_gof
## 
## Goodness-of-fit for model statistics 
## 
##        obs min  mean max MC p-value
## edges   88  66 89.61 128       0.98
## mutual  28  17 28.51  45       1.00
  • Here a p-value closer to one is better: the test is for a difference between the observed network and the simulations, so a high p means the model reproduces the statistic well

par(mfrow = c(2, 2))
m3_gof_v2 <- ergm::gof(m3)
plot(m3_gof_v2)

m3_gof_v3 <-  xergm.common::gof(m3)
plot(m3_gof_v3)

Simulating networks

sim_nets <- simulate(m3, nsim = 3)
sim_nets[[1]]
##  Network attributes:
##   vertices = 18 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   total edges= 71 
##     missing edges= 0 
##     non-missing edges= 71 
## 
##  Vertex attribute names: 
##     cloisterville group vertex.names 
## 
## No edge attributes

CLEAR THAT HOMOPHILY IS PRESENT

# GET OBSERVED NETWORK STATISTIC VALUES
summary(n ~ edges + mutual + nodematch('group'))
##           edges          mutual nodematch.group 
##              88              28              63
  • So of the 88 ties in the network, 28 are reciprocated and 63 are between monks in the same group.
  • So we want the homophily term in the model
m4 <- ergm(n ~ edges + mutual + nodematch('group'))
summary(m4)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   n ~ edges + mutual + nodematch("group")
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC % p-value    
## edges            -2.2656     0.2303      0  <1e-04 ***
## mutual            1.4445     0.4908      0  0.0035 ** 
## nodematch.group   2.0243     0.3157      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 424.2  on 306  degrees of freedom
##  Residual Deviance: 268.2  on 303  degrees of freedom
##  
## AIC: 274.2    BIC: 285.4    (Smaller is better.)
# Probability of a non-reciprocal, across-group tie:
coef(m4)[1]  %>% plogis()
##      edges 
## 0.09401459
# Probability of a non-reciprocal, within-group tie:
coef(m4)[c(1, 3)]  %>% sum()  %>% plogis()
## [1] 0.4399757

Let’s take a look at the goodness of fit of that model:

Two-in-stars can be used to represent popularity (because the more in-edges a node has, the more two-in-stars an additional in-edge will create)
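The parenthetical claim can be checked with a little combinatorics (my own illustration): a node with in-degree d contains choose(d, 2) two-in-stars, so one more in-edge creates exactly d new ones.

```r
# New two-in-stars created when a node of in-degree d gains one in-edge
d <- 0:4
choose(d + 1, 2) - choose(d, 2)   # 0 1 2 3 4: equals d, so popular nodes gain more
```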

m5 <- ergm(n ~ edges + mutual + nodematch('group') + istar(2))
summary(m5)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   n ~ edges + mutual + nodematch("group") + istar(2)
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                 Estimate Std. Error MCMC %  p-value    
## edges           -3.63673    0.37732      0  < 1e-04 ***
## mutual           1.50342    0.43991      0 0.000719 ***
## nodematch.group  2.23124    0.33717      0  < 1e-04 ***
## istar2           0.26350    0.04304      0  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 424.2  on 306  degrees of freedom
##  Residual Deviance: 257.7  on 302  degrees of freedom
##  
## AIC: 265.7    BIC: 280.6    (Smaller is better.)

GOF plots:

Model comparison

Is the fit getting better? Looks like it, but hard to say, and there is danger of overfitting here as elsewhere. Can use formal model comparison as with other models:

round(sapply(list(m1, m2, m3, m4, m5), AIC), 0)
## [1] 369 281 336 274 266

Geometrically-weighted terms

data('faux.magnolia.high')
magnolia <- faux.magnolia.high
fit_a <- ergm(magnolia ~ edges + triangles)
## Starting maximum likelihood estimation via MCMLE:
## Iteration 1 of at most 20: 
## The log-likelihood improved by 2.386 
## Iteration 2 of at most 20: 
## The log-likelihood improved by 1.511 
## Step length converged once. Increasing MCMC sample size.
## Iteration 3 of at most 20:
## Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL), : Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.

fit_b <- ergm(magnolia ~ edges + gwesp(0.25, fixed = TRUE))
## Starting maximum likelihood estimation via MCMLE:
## Iteration 1 of at most 20: 
## The log-likelihood improved by 2.671 
## Iteration 2 of at most 20: 
## The log-likelihood improved by 1.434 
## Step length converged once. Increasing MCMC sample size.
## Iteration 3 of at most 20: 
## The log-likelihood improved by 2.121 
## Step length converged twice. Stopping.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## 
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(fit_b)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   magnolia ~ edges + gwesp(0.25, fixed = TRUE)
## 
## Iterations:  3 out of 20 
## 
## Monte Carlo MLE Results:
##                  Estimate Std. Error MCMC % p-value    
## edges            -7.47459    0.04070      0  <1e-04 ***
## gwesp.fixed.0.25  2.27255    0.05577      2  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1478525  on 1066530  degrees of freedom
##  Residual Deviance:   14391  on 1066528  degrees of freedom
##  
## AIC: 14395    BIC: 14419    (Smaller is better.)

mcmc.diagnostics(fit_b)

fit_c <- ergm(magnolia ~  edges + gwesp(0.25, fixed = TRUE) + nodematch('Grade') +
                nodematch('Race') + nodematch('Sex'),
              control = control.ergm(MCMC.samplesize = 50000, MCMC.interval = 1000)
  )
summary(fit_c)
## 
## ==========================
## Summary of model fit
## ==========================
## 
## Formula:   magnolia ~ edges + gwesp(0.25, fixed = TRUE) + nodematch("Grade") + 
##     nodematch("Race") + nodematch("Sex")
## 
## Iterations:  2 out of 20 
## 
## Monte Carlo MLE Results:
##                  Estimate Std. Error MCMC % p-value    
## edges            -9.78369    0.10446      0  <1e-04 ***
## gwesp.fixed.0.25  1.79766    0.04931      0  <1e-04 ***
## nodematch.Grade   2.74762    0.08710      0  <1e-04 ***
## nodematch.Race    0.90756    0.07335      0  <1e-04 ***
## nodematch.Sex     0.77106    0.06234      0  <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1478525  on 1066530  degrees of freedom
##  Residual Deviance:   12137  on 1066525  degrees of freedom
##  
## AIC: 12147    BIC: 12206    (Smaller is better.)

mcmc.diagnostics(fit_c)

Some network statistics and their geometrically-weighted alternatives

    ERGM Term    Statistic                Substance
    ---------    ---------                ---------
    istar(k)#    k in-stars               popularity
    ostar(k)#    k out-stars              sociality
    kstar(k)#    k stars                  preferential attachment
    gwidegree    geom. w'ted in-degree    popularity
    gwodegree    geom. w'ted out-degree   sociality
    gwdegree     geom. w'ted degree       preferential attachment

    # prone to degeneracy

Call ?ergm.terms for all predefined terms

You can create your own ERGM terms

ASSIGNMENT 6

Let’s work with the directed Mad Men relationships network (it’s silly, but it’s relatively easy to work with). First install and load the geomnet package. Then, call ?mm.directed to learn about the data. Then load the data:

library(geomnet)
library(ergm)


# READ IN EDGE AND NODE DATA 
data(mm.directed)
edge_list  <- as.edgedf(mm.directed$edges)
node_data  <- mm.directed$vertices[order(mm.directed$vertices$label), ]

# CREATE NETWORK OBJECT 
n <- network(x = edge_list, matrix.type="edgelist")
# CHECK THAT NAMES ARE IN CORRECT ORDER
stopifnot(n %v% "vertex.names" == as.character(node_data$label))

# ASSIGN VERTEX ATTRIBUTE
n %v% "gender" <- as.character(node_data$Gender)

# RUN BASELINE ERGM
m1 <- ergm(n ~ edges)
## Evaluating log-likelihood at the estimate.

(instructions continue on next page)

ASSIGNMENT 6

With the Mad Men network:

  • Run some ERGMs (make sure to include one or more endogenous covariates)
    • As a constraint, use no more than 5 terms including edges
  • Choose the best model you can find to report (don’t write about more than one model). Provide a table of the model results. Include the AIC.
    • I’ll give 40 Respect Points to whomever finds the best model. Note: Respect Points have no effect on your final grade in this class.
    • ensure the model you choose is not degenerate (show figures providing evidence)
    • find the best fitting model you can, but it’s okay if the fit is not perfect.
  • Assess your model’s goodness of fit graphically and in writing
  • Interpret in writing each model coefficient. You will probably need to do more reading to properly interpret most terms. That’s part of the fun of running ERGMs.